219 research outputs found

    Enhanced differential expression statistics for data-independent acquisition proteomics

    We describe a new reproducibility-optimization method, ROPECA, for the statistical analysis of proteomics data, with a specific focus on the emerging data-independent acquisition (DIA) mass spectrometry technology. ROPECA optimizes the reproducibility of statistical testing at the peptide level and aggregates the peptide-level changes to determine differential protein-level expression. Using 'gold standard' spike-in data and hybrid proteome benchmark data, we show the competitive performance of ROPECA over conventional protein-based analysis as well as state-of-the-art peptide-based tools, especially in DIA data with consistent peptide measurements. Furthermore, we demonstrate the improved accuracy of our method in clinical studies using proteomics data from a longitudinal human twin study.
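
    The peptide-to-protein aggregation step can be illustrated with a minimal Python sketch. It is not ROPECA: it omits the reproducibility-optimized test statistic and simply takes, for each protein, the median of its peptides' log2 fold changes. The function names, the input layout (a peptides x replicates intensity array per group) and the protein labels are hypothetical.

```python
import numpy as np
from collections import defaultdict

def peptide_log2_fc(group_a, group_b):
    """Per-peptide log2 fold change: mean log2 intensity in B minus mean in A.

    group_a, group_b: arrays of shape (n_peptides, n_replicates) with positive intensities.
    """
    a = np.log2(np.asarray(group_a, dtype=float))
    b = np.log2(np.asarray(group_b, dtype=float))
    return b.mean(axis=1) - a.mean(axis=1)

def aggregate_to_protein(peptide_fc, protein_ids):
    """Summarize peptide-level changes per protein with the median (an assumption, not ROPECA)."""
    by_protein = defaultdict(list)
    for fc, protein in zip(peptide_fc, protein_ids):
        by_protein[protein].append(fc)
    return {protein: float(np.median(fcs)) for protein, fcs in by_protein.items()}

# Toy data: three peptides, two replicates per group; P1 doubles, P2 is unchanged.
group_a = [[64, 64], [128, 128], [32, 32]]
group_b = [[128, 128], [256, 256], [32, 32]]
protein_changes = aggregate_to_protein(peptide_log2_fc(group_a, group_b), ["P1", "P1", "P2"])
print(protein_changes)
```

    The median is used here so that a single outlying peptide cannot dominate the protein-level estimate.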

    Statistical and machine learning methods to study human CD4+ T cell proteome profiles

    Mass spectrometry-based proteomics has become an important part of modern immunology, making major contributions to understanding protein expression levels, subcellular localizations, posttranslational modifications, and interactions in various immune cell populations. New developments in both experimental and computational techniques offer increasing opportunities for exploring the immune system and the molecular mechanisms involved in immune responses. Here, we focus on current computational approaches to infer relevant information from large mass spectrometry-based protein profiling datasets, covering the different steps of the analysis from protein identification and quantification to further mining and modelling of the protein abundance data. Additionally, we provide a summary of the key proteome profiling studies on human CD4+ T cells and their different subtypes in health and disease.

    Integrative omics approaches provide biological and clinical insights: examples from mitochondrial diseases

    High-throughput technologies for genomics, transcriptomics, proteomics, and metabolomics, and integrative analysis of these data, enable new, systems-level insights into disease pathogenesis. Mitochondrial diseases are an excellent target for hypothesis-generating omics approaches, as the disease group is mechanistically exceptionally complex. Although the genetic background of mitochondrial diseases lies in either the nuclear or the mitochondrial genome, the typical downstream effect is dysfunction of the mitochondrial respiratory chain. However, the clinical manifestations show unprecedented variability, including either systemic or tissue-specific effects across multiple organ systems, with mild to severe symptoms, and occurring at any age. So far, omics approaches have provided mechanistic understanding of tissue specificity and potential treatment options for mitochondrial diseases, such as metabolome remodeling. However, no curative treatments exist, suggesting that novel approaches are needed. In this Review, we discuss omics approaches and discoveries with the potential to elucidate mechanisms of and therapies for mitochondrial diseases.

    Computational deconvolution to estimate cell type-specific gene expression from bulk data

    Computational deconvolution is a time- and cost-efficient approach to obtaining cell type-specific information from bulk gene expression of heterogeneous tissues such as blood. Deconvolution can aim to estimate cell type proportions or abundances in samples, to estimate how strongly each present cell type expresses different genes, or to do both simultaneously. Of these two goals, the estimation of cell type proportions or abundances has been widely studied, whereas less attention has been paid to defining cell type-specific expression profiles. Here, we address this gap by introducing a novel method, Rodeo, and empirically evaluating it alongside the other available tools from multiple perspectives using diverse datasets.
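
    When cell type proportions are known, the second deconvolution task (estimating cell type-specific expression profiles) can be posed as a regression of bulk expression on those proportions. The sketch below uses ordinary least squares under an assumed linear mixing model, bulk ~ profiles x proportions; it is not the Rodeo algorithm, and the function name and toy numbers are illustrative only.

```python
import numpy as np

def estimate_profiles(bulk, proportions):
    """Least-squares estimate of cell type-specific expression profiles.

    bulk:        (n_genes, n_samples) bulk expression matrix
    proportions: (n_cell_types, n_samples) known cell type proportions
    Returns      (n_genes, n_cell_types) estimated profiles, assuming
                 bulk ~ profiles @ proportions (a linear mixing model).
    """
    # Solve proportions.T @ profiles.T ~ bulk.T for all genes at once.
    solution, *_ = np.linalg.lstsq(proportions.T, bulk.T, rcond=None)
    return solution.T

# Toy example: 3 genes, 2 cell types, 4 samples, noise-free mixing.
true_profiles = np.array([[10.0, 1.0], [2.0, 8.0], [5.0, 5.0]])
proportions = np.array([[0.2, 0.5, 0.7, 0.9],
                        [0.8, 0.5, 0.3, 0.1]])
bulk = true_profiles @ proportions
estimated = estimate_profiles(bulk, proportions)
```

    With real, noisy data one typically needs clearly more samples than cell types and may want to constrain profiles to be non-negative; plain least squares does neither.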

    Estimating cell type-specific differential expression using deconvolution

    When differentially expressed genes are detected from samples containing different types of cells, only a very coarse overview without any cell type-specific information is obtained. Although several computational methods have been published to estimate cell type-specific differentially expressed genes from bulk samples, their performance has not been evaluated outside the original publications. Here, we compare the accuracies of nine of these methods, test their sensitivity to various factors often present in real studies, and provide practical guidelines for end users on when reliable results can and cannot be expected. Our results show that TOAST, CARseq, CellDMC and TCA are accurate methods, each with its own strengths and weaknesses. Notably, methods designed to detect cell type-specific differential methylation were comparable to those designed for gene expression, and both types outperformed methods originally designed for other tasks. The most important factors affecting the accuracy of the estimated cell type-specific differentially expressed genes are (i) the abundance of the cell type (rare cell types are harder to analyze) and (ii) individual heterogeneity in the cell type-specific expression profiles (stable cell types are easier to analyze).
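
    A common way to pose cell type-specific differential expression is to model bulk expression with cell type proportions and their interactions with the condition. A minimal least-squares version for a single gene is sketched below; it is a simplified illustration (no noise model, no significance testing) and not the implementation of any of the compared methods. The function name and the toy data are hypothetical.

```python
import numpy as np

def cell_type_specific_effects(expression, proportions, group):
    """Estimate per-cell-type condition effects for one gene.

    Assumed model: expression_s = sum_k p_ks * (base_k + effect_k * group_s)
    expression:  (n_samples,) bulk expression of one gene
    proportions: (n_samples, n_cell_types) cell type proportions
    group:       (n_samples,) 0 = control, 1 = case
    Returns (base, effect), each of length n_cell_types.
    """
    proportions = np.asarray(proportions, dtype=float)
    group = np.asarray(group, dtype=float)
    # No intercept: proportions sum to one, so an intercept would be collinear.
    design = np.hstack([proportions, proportions * group[:, None]])
    coef, *_ = np.linalg.lstsq(design, np.asarray(expression, dtype=float), rcond=None)
    n_types = proportions.shape[1]
    return coef[:n_types], coef[n_types:]

# Toy data: the gene changes only in cell type 2 (effect +3), not in cell type 1.
p1 = np.array([0.2, 0.4, 0.6, 0.3, 0.5, 0.8])
props = np.column_stack([p1, 1 - p1])
group = np.array([0, 0, 0, 1, 1, 1])
expr = props[:, 0] * 10.0 + props[:, 1] * (2.0 + 3.0 * group)
base, effect = cell_type_specific_effects(expr, props, group)
```

    In this formulation, a cell type whose proportions are small or barely vary across samples contributes little information to its design-matrix columns, which is one way to see why rare cell types are harder to analyze.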

    A systematic evaluation of normalization methods in quantitative label-free proteomics

    To date, mass spectrometry (MS) data remain inherently biased for reasons ranging from sample handling to differences caused by the instrumentation. Normalization aims to account for this bias and make samples more comparable. The selection of a proper normalization method is pivotal for the reliability of the downstream analysis and results. Many normalization methods commonly used in proteomics have been adapted from DNA microarray techniques. Previous studies comparing normalization methods in proteomics have focused mainly on intragroup variation. In this study, several popular, widely used normalization methods representing different normalization strategies are evaluated using three spike-in data sets and one experimental mouse label-free proteomic data set. The methods are evaluated in terms of their ability to reduce variation between technical replicates, their effect on differential expression analysis, and their effect on the estimation of logarithmic fold changes. Additionally, we examined whether normalizing the whole data set globally or in segments for the differential expression analysis affects the performance of the normalization methods. We found that variance stabilization normalization (Vsn) reduced variation between technical replicates the most in all examined data sets. Vsn also performed consistently well in the differential expression analysis. Linear regression normalization and local regression normalization also performed systematically well. Finally, we discuss the choice of a normalization method and some qualities of a suitable normalization method in light of the results of our evaluation.
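
    As an illustration of one of the well-performing strategies, the sketch below implements a simple form of linear regression normalization on log-scale intensities: each sample is regressed against a reference (here assumed to be the per-protein median across samples) and the fitted linear trend is removed. Published implementations may differ in details such as the choice of reference and missing-value handling; this version assumes a complete matrix, and the function name is hypothetical.

```python
import numpy as np

def linear_regression_normalize(log_intensities):
    """Linear regression normalization of a (n_proteins, n_samples) log-intensity matrix.

    Assumes no missing values. Each sample is fitted as
    sample ~ intercept + slope * reference and mapped back onto the
    reference scale via (sample - intercept) / slope.
    """
    data = np.asarray(log_intensities, dtype=float)
    reference = np.median(data, axis=1)  # per-protein median across samples
    normalized = np.empty_like(data)
    for j in range(data.shape[1]):
        slope, intercept = np.polyfit(reference, data[:, j], deg=1)
        normalized[:, j] = (data[:, j] - intercept) / slope
    return normalized

# Toy data: sample 2 is shifted up and sample 3 is shifted and scaled.
base = np.array([20.0, 22.0, 24.5, 26.0, 28.0])
data = np.column_stack([base, base + 0.5, 1.1 * base - 1.0])
normalized = linear_regression_normalize(data)
```

    In this toy case every sample differs from the reference only by a linear transformation, so all normalized columns coincide; with real data only the systematic linear part of the bias is removed.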

    Introducing untargeted data-independent acquisition for metaproteomics of complex microbial samples

    Mass spectrometry-based metaproteomics is a relatively new field of research that enables the characterization of the functionality of microbiota. Recently, we demonstrated the applicability of data-independent acquisition (DIA) mass spectrometry to the analysis of complex metaproteomic samples. This allowed us to circumvent many of the drawbacks of the previously used data-dependent acquisition (DDA) mass spectrometry, mainly the limited reproducibility when analyzing samples with complex microbial composition. However, the DDA-assisted DIA approach still required additional DDA data on the samples to assist the analysis. Here, we introduce, for the first time, an untargeted DIA metaproteomics tool that does not require any DDA data, but instead generates a pseudospectral library directly from the DIA data. This reduces the amount of required mass spectrometry data to a single DIA run per sample. The new DIA-only metaproteomics approach is implemented as a new open-source software package named glaDIAtor, including a modern web-based graphical user interface to facilitate wide use of the tool by the community.

    Exon-level estimates improve the detection of differentially expressed genes in RNA-seq studies

    Detection of differentially expressed genes (DEGs) between biological conditions is a key data analysis step in most RNA-sequencing studies. Conventionally, computational tools have used gene-level read counts as input to test for differential gene expression between sample groups. Recently, it has been suggested that statistical testing could be performed with increased power at a lower feature level before aggregating the results to the gene level. In this study, we systematically compared the performance of DEG detection when using read count data at different levels (gene, transcript, and exon) as input, using two publicly available data sets. Additionally, we tested two methods for aggregating the lower feature-level p-values to the gene level: the Lancaster method and the empirical Brown's method. Our results show that DEG detection improves over the conventional gene-level approach regardless of which lower feature level is used for statistical testing. The best overall balance between accuracy and false discovery rate was obtained with the exon-level approach combined with empirical Brown's aggregation method, which we provide as a freely available Bioconductor package, EBSEA (https://bioconductor.org/packages/release/bioc/html/EBSEA.html).
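
    Lancaster's method generalizes Fisher's method by giving each p-value a weight that serves as its chi-squared degrees of freedom; with every weight equal to 2 it reduces to Fisher's method. The standard-library sketch below implements only that equal-weight special case and assumes the exon-level p-values are independent. The empirical Brown's method used in EBSEA instead corrects for correlation between the combined p-values and is not reproduced here; the function name is hypothetical.

```python
import math

def fisher_combine(pvalues):
    """Combine independent p-values (e.g., exon-level) into one gene-level p-value.

    Fisher's statistic X = -2 * sum(log p) follows a chi-squared distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees of
    freedom the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)**i / i!.
    """
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (X/2)**i / i! incrementally
        total += term
    return math.exp(-half_x) * total

# Three exons of one hypothetical gene.
gene_p = fisher_combine([0.04, 0.01, 0.30])
```

    The closed-form survival function avoids a SciPy dependency; for Lancaster's method with unequal weights, a general chi-squared distribution function would be needed instead.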

    Likelihood contrasts: a machine learning algorithm for binary classification of longitudinal data

    Machine learning methods have gained popularity in biomedical research in recent years. However, very few of them support the analysis of longitudinal data, in which several samples are collected from an individual over time. Additionally, most of the available longitudinal machine learning methods assume that the measurements are aligned in time, which is often not the case in real data. Here, we introduce a robust longitudinal machine learning method, named likelihood contrasts (LC), which supports study designs with unaligned time points. LC is a binary classifier that uses linear mixed models for modelling and log-likelihood for decision making. To demonstrate the benefits of our approach, we compared it with existing methods on four simulated and three real data sets. LC was the most accurate method in every simulated data set, and the real data sets further supported its robust performance. LC is also computationally efficient and easy to use.
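
    The decision rule described above (fit one model per class, then assign a new individual to the class whose model gives the higher log-likelihood) can be sketched as follows. To stay short, this sketch replaces LC's linear mixed models with a fixed-effects linear trend in time with Gaussian residuals, so it ignores between-individual random effects; it is not the published LC implementation, and all names are hypothetical. Because the model is a function of time, a new individual's measurements need not be aligned with the training time points.

```python
import numpy as np

def fit_class_model(times, values):
    """Fit value ~ intercept + slope * time with Gaussian residuals (simplified: no random effects)."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    slope, intercept = np.polyfit(times, values, deg=1)
    residuals = values - (intercept + slope * times)
    variance = max(float(np.mean(residuals ** 2)), 1e-9)  # guard against zero variance
    return intercept, slope, variance

def log_likelihood(model, times, values):
    """Gaussian log-likelihood of one individual's time series under a class model."""
    intercept, slope, variance = model
    predicted = intercept + slope * np.asarray(times, dtype=float)
    residuals = np.asarray(values, dtype=float) - predicted
    return float(-0.5 * np.sum(np.log(2 * np.pi * variance) + residuals ** 2 / variance))

def classify(model_a, model_b, times, values):
    """Return the predicted class and the log-likelihood contrast (A minus B)."""
    contrast = log_likelihood(model_a, times, values) - log_likelihood(model_b, times, values)
    return ("A" if contrast > 0 else "B"), contrast

# Training data pooled per class (two individuals each, sampled at t = 0..3).
t_train = [0, 1, 2, 3, 0, 1, 2, 3]
model_a = fit_class_model(t_train, [1.1, 1.4, 2.1, 2.4, 0.9, 1.6, 1.9, 2.6])  # rising
model_b = fit_class_model(t_train, [5.1, 4.4, 4.1, 3.4, 4.9, 4.6, 3.9, 3.6])  # falling

# A new individual measured at unaligned time points.
label, contrast = classify(model_a, model_b, [0.5, 2.5], [1.25, 2.25])
```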

    Integrating probe-level expression changes across generations of Affymetrix arrays

    There is an urgent need for bioinformatic methods that allow integrative analysis of multiple microarray data sets. While previous studies have mainly concentrated on the reproducibility of gene expression levels within or between different platforms, we propose a novel meta-analytic method that takes into account the vast amount of available probe-level information to combine expression changes across different studies. We first show that the comparability of relative expression changes and the consistency of differentially expressed genes between different Affymetrix array generations can be considerably improved by determining the expression changes at the probe level and by using the latest probe-level sequence-matching information instead of the probe annotations provided by the manufacturer. With these improved probe-level expression change estimates, data from different generations of Affymetrix arrays can be combined more effectively. This will allow full exploitation of existing results when designing and analyzing new experiments.
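
    A minimal sketch of combining probe-level changes across two array generations: probe-level log2 fold changes from each study are keyed by probe sequence, mapped to genes through a sequence-based probe-to-gene annotation, and summarized per gene with the median. Keying by sequence, the median summary and all names here are illustrative assumptions rather than the published meta-analytic method; in practice the annotation would come from up-to-date sequence matching, not a hand-written dictionary.

```python
from collections import defaultdict
from statistics import median

def combine_probe_changes(studies, probe_to_gene):
    """Summarize probe-level log2 fold changes from several studies per gene.

    studies:       list of dicts, probe sequence -> log2 fold change in that study
    probe_to_gene: dict, probe sequence -> gene ID (sequence-based annotation)
    Probes without a gene assignment are ignored.
    """
    changes = defaultdict(list)
    for study in studies:
        for sequence, log_fc in study.items():
            gene = probe_to_gene.get(sequence)
            if gene is not None:
                changes[gene].append(log_fc)
    return {gene: median(values) for gene, values in changes.items()}

# Toy data with shortened, hypothetical probe sequences.
older_array = {"AAAC": 1.0, "CCCG": 1.2, "GGGT": -0.5}
newer_array = {"AAAC": 0.8, "GGGT": -0.7, "TTTA": 2.0}  # "TTTA" has no gene match
annotation = {"AAAC": "GENE1", "CCCG": "GENE1", "GGGT": "GENE2"}
gene_changes = combine_probe_changes([older_array, newer_array], annotation)
```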